14 research outputs found

    KG-Hub-building and exchanging biological knowledge graphs.

    MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org
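The extract-transform-load pattern described above can be sketched in a few lines. This is an illustrative sketch, not KG-Hub's own code: the input record, its fields, and the CURIEs are invented, but the output follows the tab-separated KGX node/edge convention that Biolink-compliant graphs use.

```python
import csv
import io

def transform(record):
    """Map one upstream gene-disease association to KGX node and edge rows."""
    nodes = [
        {"id": record["gene_id"], "category": "biolink:Gene", "name": record["gene_symbol"]},
        {"id": record["disease_id"], "category": "biolink:Disease", "name": record["disease_name"]},
    ]
    edges = [
        {
            "subject": record["gene_id"],
            "predicate": "biolink:gene_associated_with_condition",
            "object": record["disease_id"],
        }
    ]
    return nodes, edges

def write_kgx(rows, fieldnames):
    """Serialize rows as a tab-separated KGX table."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical upstream record, for illustration only.
record = {
    "gene_id": "HGNC:1100",
    "gene_symbol": "BRCA1",
    "disease_id": "MONDO:0007254",
    "disease_name": "breast cancer",
}
nodes, edges = transform(record)
print(write_kgx(nodes, ["id", "category", "name"]))
print(write_kgx(edges, ["subject", "predicate", "object"]))
```

Because every KG-Hub graph is emitted in this shared shape, a subgraph transformed for one project can be merged into another simply by concatenating node and edge tables.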

    The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species.

    Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.

    A Simple Standard for Sharing Ontological Mappings (SSSOM).

    Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec
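Point (ii) above, the simple table-based format, can be demonstrated with the standard library alone, with no ontology parsing or querying. The mapping-set metadata and the two mappings below are invented for illustration; the column names (subject_id, predicate_id, object_id, mapping_justification) and the SKOS match predicates follow the SSSOM specification.

```python
import csv
import io

# A minimal SSSOM TSV: '#'-prefixed header lines carry mapping-set
# metadata; the remainder is an ordinary tab-separated table.
sssom_tsv = """\
# mapping_set_id: https://example.org/mappings/demo.sssom.tsv
# license: https://creativecommons.org/publicdomain/zero/1.0/
subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence
HP:0000822\tskos:exactMatch\tMP:0001943\tsemapv:ManualMappingCuration\t0.99
HP:0002099\tskos:broadMatch\tMP:0001942\tsemapv:LexicalMatching\t0.80
"""

lines = sssom_tsv.splitlines()
metadata = [ln.lstrip("# ") for ln in lines if ln.startswith("#")]
table = [ln for ln in lines if not ln.startswith("#")]
mappings = list(csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t"))

# The explicit predicate lets a consumer keep only the matches precise
# enough for its use case, e.g. diagnostics.
exact = [m for m in mappings if m["predicate_id"] == "skos:exactMatch"]
print(len(mappings), len(exact))  # → 2 1
```

Note how the predicate and justification columns make the paper's central point concrete: whether a mapping is exact or merely broad, and whether it was curated or machine-generated, is recorded in the table itself rather than left implicit.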

    OntoGPT

    This release primarily concerns bugfixes. Thanks to all users who have provided feedback!

    What's Changed
    - Add option to show prompt by @caufieldjh in https://github.com/monarch-initiative/ontogpt/pull/183
    - Quick fix for type mismatch in embed command by @caufieldjh in https://github.com/monarch-initiative/ontogpt/pull/185
    - Updates for pydantic 2 compatibility by @caufieldjh in https://github.com/monarch-initiative/ontogpt/pull/189
    - Repairs for incorrect model specification by @caufieldjh in https://github.com/monarch-initiative/ontogpt/pull/192
    - Add SPIRES logo by @caufieldjh in https://github.com/monarch-initiative/ontogpt/pull/196
    - Improve intro based on more up-to-date main/docs/index.md by @nlharris in https://github.com/monarch-initiative/ontogpt/pull/194
    - Fixed bug where named arguments did not match up in recurse function by @cmungall in https://github.com/monarch-initiative/ontogpt/pull/198
    - Adding more prompts and example text to improve results by @diatomsRcool in https://github.com/monarch-initiative/ontogpt/pull/116
    - Address errors in using gene requests cache by @caufieldjh in https://github.com/monarch-initiative/ontogpt/pull/203
    - Make kgx tsv by @hrshdhgd in https://github.com/monarch-initiative/ontogpt/pull/149
    - Fix #204 and the remainder of the HPOA evaluation by @caufieldjh in https://github.com/monarch-initiative/ontogpt/pull/207
    - Run mypy through tox and address type check errors by @caufieldjh in https://github.com/monarch-initiative/ontogpt/pull/202

    New Contributors
    - @nlharris made their first contribution in https://github.com/monarch-initiative/ontogpt/pull/194

    Full Changelog: https://github.com/monarch-initiative/ontogpt/compare/v0.3.1...v0.3.2

    If you use OntoGPT, please cite it as follows

    Model organism data evolving in support of translational medicine.

    Model organism databases (MODs) have been collecting and integrating biomedical research data for 30 years and were designed to meet specific needs of each model organism research community. The contributions of model organism research to understanding biological systems would be hard to overstate. Modern molecular biology methods and cost reductions in nucleotide sequencing have opened avenues for direct application of model organism research to elucidating mechanisms of human diseases. Thus, the mandate for model organism research and databases has now grown to include facilitating use of these data in translational applications. Challenges in meeting this opportunity include the distribution of research data across many databases and websites, a lack of data format standards for some data types, and sustainability of scale and cost for genomic database resources like MODs. The issues of widely distributed data and application of data standards are some of the challenges addressed by FAIR (Findable, Accessible, Interoperable, and Re-usable) data principles. The Alliance of Genome Resources is now moving to address these challenges by bringing together expertly curated research data from fly, mouse, rat, worm, yeast, zebrafish, and the Gene Ontology consortium. Centralized multi-species data access, integration, and format standardization will lower the data utilization barrier in comparative genomics and translational applications and will provide a framework in which sustainable scale and cost can be addressed. This article presents a brief historical perspective on how the Alliance model organisms are complementary and how they have already contributed to understanding the etiology of human diseases. In addition, we discuss four challenges for using data from MODs in translational applications and how the Alliance is working to address them, in part by applying FAIR data principles. 
    Ultimately, combined data from these animal models are more powerful than the sum of the parts.

    Cross-organism analysis using InterMine.

    InterMine is a data integration warehouse and analysis software system developed for large and complex biological data sets. Designed for integrative analysis, it can be accessed through a user-friendly web interface. For bioinformaticians, extensive web services as well as programming interfaces for most common scripting languages support access to all features. The web interface includes a useful identifier look-up system, and both simple and sophisticated search options. Interactive results tables enable exploration, and data can be filtered, summarized, and browsed. A set of graphical analysis tools provide a rich environment for data exploration including statistical enrichment of sets of genes or other entities. InterMine databases have been developed for the major model organisms, budding yeast, nematode worm, fruit fly, zebrafish, mouse, and rat together with a newly developed human database. Here, we describe how this has facilitated interoperation and development of cross-organism analysis tools and reports. InterMine as a data exploration and analysis tool is also described. All the InterMine-based systems described in this article are resources freely available to the scientific community
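The web services mentioned above accept path queries over plain HTTP, so scripted access needs no special client. The sketch below only constructs a query URL against the public FlyMine instance without sending a request; the query XML and the `format` parameter value are assumptions based on InterMine's path-query convention, and the gene symbol is illustrative.

```python
from urllib.parse import urlencode

# Base URL of the public FlyMine InterMine instance.
BASE = "https://www.flymine.org/flymine/service"

# A path query: which columns to return (view) and a constraint on them.
query_xml = (
    '<query model="genomic" view="Gene.symbol Gene.organism.name">'
    '<constraint path="Gene.symbol" op="=" value="zen"/>'
    "</query>"
)

# The query document is passed to the query/results endpoint;
# "tab" requests tab-separated results (an assumed format name).
params = urlencode({"query": query_xml, "format": "tab"})
url = f"{BASE}/query/results?{params}"
print(url)
```

For most scripting languages the programming interfaces mentioned in the abstract wrap exactly this kind of request, adding model-aware query builders on top.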

    Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science.

    Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness among core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these knowledge graphs (KGs) has remained difficult. Data set heterogeneity and complexity; the proliferation of ad hoc data formats; poor compliance with guidelines on findability, accessibility, interoperability, and reusability; and, in particular, the lack of a universally accepted, open-access model for standardization across biomedical KGs have left the task of reconciling data sources to downstream consumers. Biolink Model is an open-source data model that can be used to formalize the relationships between data structures in translational science. It incorporates object-oriented classification and graph-oriented features. The core of the model is a set of hierarchical, interconnected classes (or categories) and relationships between them (or predicates) representing biomedical entities such as gene, disease, chemical, anatomic structure, and phenotype. The model provides class and edge attributes and associations that guide how entities should relate to one another. Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. We demonstrate the utility of Biolink Model in various initiatives, including the Biomedical Data Translator Consortium and the Monarch Initiative, and show how it has supported easier integration and interoperability of biomedical KGs, bringing together knowledge from multiple sources and helping to realize the goals of translational science.
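The core pattern the abstract describes, hierarchical categories on nodes and predicate-labeled associations between them, can be sketched with plain dataclasses. This is an illustrative shape rather than the model's normative schema; the category and predicate CURIEs follow Biolink naming, but the gene-disease example itself is invented.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    name: str
    # Most specific category first, up the hierarchy to biolink:NamedThing.
    category: list

@dataclass
class Association:
    subject: str
    predicate: str
    object: str
    # Edge attributes (e.g. provenance) travel with the association.
    attributes: dict = field(default_factory=dict)

gene = Node("HGNC:1097", "BRAF", ["biolink:Gene", "biolink:NamedThing"])
disease = Node("MONDO:0005105", "melanoma", ["biolink:Disease", "biolink:NamedThing"])
edge = Association(
    gene.id,
    "biolink:gene_associated_with_condition",
    disease.id,
    {"primary_knowledge_source": "infores:monarchinitiative"},
)
print(edge.predicate)
```

Because every participating graph tags its nodes and edges with these shared categories and predicates, two KGs built from different sources can be merged and queried as one, which is the interoperability the abstract credits the model with.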

    InterMOD: integrated data and tools for the unification of model organism research.

    Model organisms are widely used for understanding basic biology, and have significantly contributed to the study of human disease. In recent years, genomic analysis has provided extensive evidence of widespread conservation of gene sequence and function amongst eukaryotes, allowing insights from model organisms to help decipher gene function in a wider range of species. The InterMOD consortium is developing an infrastructure based around the InterMine data warehouse system to integrate genomic and functional data from a number of key model organisms, leading the way to improved cross-species research. So far including budding yeast, nematode worm, fruit fly, zebrafish, rat and mouse, the project has set up data warehouses, synchronized data models, and created analysis tools and links between data from different species. The project unites a number of major model organism databases, improving both the consistency and accessibility of comparative research, to the benefit of the wider scientific community. Sci Rep 2013 May 8; 3:1802